
HBASE-30036 Skip redundant delete markers during flush and minor compaction #7993

Open

junegunn wants to merge 1 commit into apache:master from junegunn:HBASE-30036


@junegunn (Member) commented Mar 27, 2026

https://issues.apache.org/jira/browse/HBASE-30036

This PR consolidates redundant delete markers for the same row or column during flushes and minor compactions.

Test result

Consolidation of contiguous DeleteColumn markers


Consolidation of DeleteFamily markers (which are inherently contiguous)


Description

Add DeleteTracker.isRedundantDelete() to detect when a delete marker is already covered by a previously tracked delete of equal or broader scope. ScanDeleteTracker implements this for all four delete types:

  • DeleteFamily/DeleteFamilyVersion: covered by a tracked DeleteFamily
  • DeleteColumn/Delete: covered by a tracked DeleteFamily or DeleteColumn
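
The coverage rules above can be sketched as a small per-row tracker. This is a minimal, hypothetical model rather than HBase's `ScanDeleteTracker`: `MarkerType`, `SimpleDeleteTracker`, `track()` and `isRedundant()` are illustrative names, and the sketch assumes markers arrive in HBase sort order (family markers first, then each column newest first).

```java
import java.util.Arrays;

// Hypothetical stand-ins for the four HBase delete marker types.
enum MarkerType { DELETE_FAMILY, DELETE_FAMILY_VERSION, DELETE_COLUMN, DELETE }

// Simplified per-row tracker; illustrative only, not HBase's ScanDeleteTracker.
final class SimpleDeleteTracker {
  private long familyStamp = Long.MIN_VALUE;  // newest DeleteFamily timestamp tracked in this row
  private byte[] columnQualifier = null;      // qualifier of the last tracked DeleteColumn
  private long columnStamp = Long.MIN_VALUE;  // timestamp of that DeleteColumn

  void track(MarkerType type, byte[] qualifier, long ts) {
    if (type == MarkerType.DELETE_FAMILY) {
      familyStamp = Math.max(familyStamp, ts);
    } else if (type == MarkerType.DELETE_COLUMN) {
      columnQualifier = qualifier;
      columnStamp = ts;
    }
    // DeleteFamilyVersion and Delete mask a single version only, so in this
    // sketch they never make another marker redundant.
  }

  // True if the marker is already covered by a tracked delete of equal or broader scope.
  boolean isRedundant(MarkerType type, byte[] qualifier, long ts) {
    boolean coveredByFamily = ts <= familyStamp;
    switch (type) {
      case DELETE_FAMILY:
      case DELETE_FAMILY_VERSION:
        return coveredByFamily;
      case DELETE_COLUMN:
      case DELETE:
        boolean coveredByColumn = columnQualifier != null
            && Arrays.equals(columnQualifier, qualifier)
            && ts <= columnStamp;
        return coveredByFamily || coveredByColumn;
      default:
        return false;
    }
  }
}
```

Note that the family check compares timestamps: in this model a DeleteColumn newer than the tracked DeleteFamily is not redundant and must be kept.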

MinorCompactionScanQueryMatcher calls this check before including a delete marker, returning SEEK_NEXT_COL to skip past all remaining cells covered by the previously tracked delete.
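
As a rough illustration of how returning `SEEK_NEXT_COL` collapses a run of markers, the following hypothetical single-column simulation models a flush or minor compaction. `CellKind`, `FlushCell` and `FlushCompactionSketch` are made-up names; the seek is emulated by skipping cells until the qualifier changes, and the sketch assumes that puts masked by a kept DeleteColumn are dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cell kinds for this sketch (real HBase has more types).
enum CellKind { PUT, DELETE_COLUMN }

// Hypothetical minimal cell: qualifier, timestamp and kind.
final class FlushCell {
  final String qualifier;
  final long ts;
  final CellKind kind;

  FlushCell(String qualifier, long ts, CellKind kind) {
    this.qualifier = qualifier;
    this.ts = ts;
    this.kind = kind;
  }

  @Override
  public String toString() {
    return kind + "(" + qualifier + "@" + ts + ")";
  }
}

final class FlushCompactionSketch {
  // `sorted` must follow HBase order: qualifier ascending, then timestamp descending.
  // skipRedundant=false models the old behavior of keeping every marker.
  static List<FlushCell> compact(List<FlushCell> sorted, boolean skipRedundant) {
    List<FlushCell> out = new ArrayList<>();
    String trackedQualifier = null;  // column of the newest DeleteColumn kept so far
    long trackedTs = Long.MIN_VALUE; // timestamp of that DeleteColumn
    String seekingPast = null;       // column being skipped after a SEEK_NEXT_COL
    for (FlushCell c : sorted) {
      if (c.qualifier.equals(seekingPast)) {
        continue; // emulated seek: drop the remaining cells of this column
      }
      seekingPast = null;
      boolean covered = c.qualifier.equals(trackedQualifier) && c.ts <= trackedTs;
      if (c.kind == CellKind.DELETE_COLUMN) {
        if (skipRedundant && covered) {
          seekingPast = c.qualifier; // redundant marker: return SEEK_NEXT_COL
          continue;
        }
        if (!c.qualifier.equals(trackedQualifier)) {
          trackedQualifier = c.qualifier; // first (newest) DeleteColumn of this column
          trackedTs = c.ts;
        }
        out.add(c); // the marker itself is retained
      } else if (!covered) {
        out.add(c); // put not masked by the tracked DeleteColumn
      }
    }
    return out;
  }
}
```

For column `q1` holding `PUT@400, DC@300, DC@200, PUT@150, DC@100`, the sketch keeps only `PUT@400` and `DC@300` with the check enabled, but `PUT@400, DC@300, DC@200, DC@100` without it.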

Compatible with KEEP_DELETED_CELLS. When set to TRUE, trackDelete() does not populate the delete tracker, so isRedundantDelete() always returns false and all markers are retained.
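
A hypothetical sketch of why KEEP_DELETED_CELLS=TRUE disables the consolidation: if tracking is a no-op, the redundancy check has nothing to compare against. `KeepDeletedCellsSketch` and its boolean flag are illustrative only, and the model covers a single column that contains only DeleteColumn markers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tracker for one column; KEEP_DELETED_CELLS is modeled as a boolean.
final class KeepDeletedCellsSketch {
  private final boolean keepDeletedCells;
  private long trackedTs = -1; // -1 means no DeleteColumn has been tracked

  KeepDeletedCellsSketch(boolean keepDeletedCells) {
    this.keepDeletedCells = keepDeletedCells;
  }

  void trackDelete(long ts) {
    // With KEEP_DELETED_CELLS=TRUE nothing is recorded.
    if (!keepDeletedCells) {
      trackedTs = Math.max(trackedTs, ts);
    }
  }

  boolean isRedundantDelete(long ts) {
    // An empty tracker never reports a marker as redundant.
    return trackedTs >= 0 && ts <= trackedTs;
  }

  // Timestamps of the DeleteColumn markers (newest first) that survive a flush.
  static List<Long> retainedMarkers(long[] newestFirst, boolean keepDeletedCells) {
    KeepDeletedCellsSketch tracker = new KeepDeletedCellsSketch(keepDeletedCells);
    List<Long> kept = new ArrayList<>();
    for (long ts : newestFirst) {
      if (tracker.isRedundantDelete(ts)) {
        break; // SEEK_NEXT_COL: the rest of this column is skipped
      }
      tracker.trackDelete(ts);
      kept.add(ts);
    }
    return kept;
  }
}
```

With the flag off, `retainedMarkers(new long[] {300, 200, 100}, false)` keeps only the newest marker; with it on, all three survive.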

junegunn added a commit to junegunn/hbase that referenced this pull request Mar 29, 2026
HBASE-30036 (apache#7993) consolidates redundant delete markers on flush,
preventing them from growing unbounded in HFiles. However, markers still
accumulate in the memstore before flush, degrading read performance.
HBASE-29039 addresses this from the read path side. Both are needed for
full coverage. There is an open PR (apache#6557), but the review process has
been stalled. This is an alternative approach with fewer code changes,
hopefully making it easier to reach consensus.

When a DeleteColumn or DeleteFamily marker is encountered during a normal
user scan, the matcher currently returns SKIP, forcing the scanner to
advance one cell at a time. This causes read latency to degrade linearly
with the number of accumulated delete markers for the same row or column.
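
To make the linear cost concrete, here is a hypothetical step count for one column in which `D` marks a DeleteColumn and `P` a put, newest first. As a simplification, it assumes a put masked by a tracked DeleteColumn already triggers a column seek, and it ignores version limits; `ScanCostSketch` is an illustrative name.

```java
final class ScanCostSketch {
  // `column` lists one column's cells newest first: 'D' = DeleteColumn marker, 'P' = put.
  // Returns how many cells the matcher examines before leaving the column.
  static int cellsExamined(String column, boolean seekOnDeleteColumn) {
    boolean deleted = false;
    int examined = 0;
    for (char c : column.toCharArray()) {
      examined++;
      if (c == 'D') {
        if (seekOnDeleteColumn) {
          return examined; // proposed: jump past the whole column at the first marker
        }
        deleted = true; // current behavior: track the marker and SKIP to the next cell
      } else if (deleted) {
        return examined; // simplification: a masked put triggers a column seek
      }
      // A put newer than every delete is included; version limits are ignored here.
    }
    return examined;
  }
}
```

For 1,000 accumulated markers followed by one put, the SKIP path examines 1,001 cells, while seeking at the first marker examines one.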

Since these are range deletes that mask all remaining versions of the
column, seek past the entire column immediately via
columns.getNextRowOrNextColumn(). This is safe because cells arrive in
timestamp descending order, so any puts newer than the delete have
already been processed.
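
The safety argument can be checked with a small randomized comparison. The sketch assumes distinct timestamps sorted newest first and ignores versions, TTL and visibility labels; `SeekSafetySketch` is hypothetical. It compares the definition (a put is visible only if no DeleteColumn in its column has an equal or newer timestamp) with a streaming pass that stops at the first DeleteColumn.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class SeekSafetySketch {
  // Hypothetical cell of a single column: timestamp plus whether it is a DeleteColumn.
  static final class Cell {
    final long ts;
    final boolean deleteColumn;

    Cell(long ts, boolean deleteColumn) {
      this.ts = ts;
      this.deleteColumn = deleteColumn;
    }
  }

  // Definition: a put is visible iff no DeleteColumn in the column has ts >= the put's ts.
  static List<Long> visibleByDefinition(List<Cell> column) {
    List<Long> out = new ArrayList<>();
    for (Cell p : column) {
      if (p.deleteColumn) {
        continue;
      }
      boolean masked = false;
      for (Cell d : column) {
        if (d.deleteColumn && d.ts >= p.ts) {
          masked = true;
        }
      }
      if (!masked) {
        out.add(p.ts);
      }
    }
    return out;
  }

  // Streaming pass over cells sorted newest first: seek away at the first DeleteColumn.
  static List<Long> visibleWithSeek(List<Cell> newestFirst) {
    List<Long> out = new ArrayList<>();
    for (Cell c : newestFirst) {
      if (c.deleteColumn) {
        break; // every later cell is older, so this marker masks it
      }
      out.add(c.ts); // this put precedes every DeleteColumn, so nothing masks it
    }
    return out;
  }

  // Compares both on random columns with distinct, descending timestamps.
  static boolean agreeOnRandomColumns(long seed, int trials) {
    Random rnd = new Random(seed);
    for (int t = 0; t < trials; t++) {
      List<Cell> column = new ArrayList<>();
      int n = 1 + rnd.nextInt(20);
      for (long ts = n; ts >= 1; ts--) {
        column.add(new Cell(ts, rnd.nextInt(4) == 0)); // roughly 1 in 4 cells is a DeleteColumn
      }
      if (!visibleByDefinition(column).equals(visibleWithSeek(column))) {
        return false;
      }
    }
    return true;
  }
}
```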

For DeleteFamily, also fix getKeyForNextColumn in ScanQueryMatcher to
bypass the empty-qualifier guard (HBASE-18471) when the cell is a
DeleteFamily marker. Without this, the seek barely advances past the
current cell instead of jumping to the first real qualified column.
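
A hypothetical model of the seek target, to show why the guard matters. Keys carry only a qualifier and a timestamp, "next column" is modeled as the smallest qualifier sorting after the current one, and stepping past a cell is modeled as the same column with the next older timestamp; `SeekKey` and `NextColumnKeySketch` are illustrative and are not HBase's `ScanQueryMatcher` API.

```java
// Hypothetical, simplified seek key: qualifier and timestamp only (row and family omitted).
final class SeekKey {
  final String qualifier;
  final long ts;

  SeekKey(String qualifier, long ts) {
    this.qualifier = qualifier;
    this.ts = ts;
  }
}

final class NextColumnKeySketch {
  static SeekKey keyForNextColumn(String qualifier, long ts, boolean isDeleteFamily) {
    if (qualifier.isEmpty() && !isDeleteFamily) {
      // Guard in the spirit of HBASE-18471: an empty-qualifier cell may be followed by
      // older DeleteFamily markers in the same column, so only step past this cell.
      return new SeekKey(qualifier, ts - 1);
    }
    // Every cell left in the empty-qualifier column is no newer than this DeleteFamily,
    // so it is already covered; jump to the smallest qualifier after the current one,
    // at the newest possible timestamp.
    return new SeekKey(qualifier + "\0", Long.MAX_VALUE);
  }
}
```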

The optimization is skipped when:
- seePastDeleteMarkers is true (KEEP_DELETED_CELLS)
- newVersionBehavior is enabled (sequence IDs determine visibility)
- the delete marker is not tracked (visibility labels)
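
The conditions above amount to a simple gate in front of the seek. This is a hypothetical sketch: `ScanCode`, `SeekOnDeleteGate` and the parameter names echo the commit message but are not HBase's actual signatures.

```java
// Hypothetical subset of outcomes for a range-delete marker in a user scan.
enum ScanCode { SKIP, SEEK_PAST_COLUMN }

final class SeekOnDeleteGate {
  // Decides how a user scan treats a DeleteColumn or DeleteFamily marker.
  static ScanCode onRangeDelete(boolean seePastDeleteMarkers,
                                boolean newVersionBehavior,
                                boolean markerTracked) {
    if (seePastDeleteMarkers) {
      return ScanCode.SKIP; // KEEP_DELETED_CELLS: deleted cells may still be visible
    }
    if (newVersionBehavior) {
      return ScanCode.SKIP; // visibility depends on sequence IDs, not only timestamp order
    }
    if (!markerTracked) {
      return ScanCode.SKIP; // e.g. visibility labels: the marker may not mask every cell
    }
    return ScanCode.SEEK_PAST_COLUMN; // everything left in the column is masked
  }
}
```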